Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

Article data
Submitted: 14 April 2020
Accepted: 01 March 2021
Published online: 05 October 2021
Keywords: temporal difference learning, Polyak–Ruppert averaging, variance reduction
AMS subject headings: 68Q25, 68R10, 68U05

Publication data
ISSN (online): 2577-0187
Publisher: Society for Industrial and Applied Mathematics
CODEN: sjmdaq

Similar resources

Analysis of Temporal-Difference Learning

We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a ch...
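As a rough illustration of the online update described above, here is a minimal Python sketch of TD(0) with a linear approximator, updating a parameter vector while it simulates one trajectory of a small finite Markov chain. It assumes a discounted cost (discount factor `gamma`), an assumed diminishing step size 10/(t + 100), and the hypothetical helper names `td0_linear` and `dot`; none of these come from the abstract, and the analyzed algorithm is more general, so treat this as a sketch rather than the authors' method.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def td0_linear(P, cost, features, gamma, steps, seed=0):
    """TD(0) with a linear approximator J(i) ~ theta . features[i].

    Updates theta online while simulating a single trajectory of the
    Markov chain with transition matrix P (each row sums to 1).
    """
    rng = random.Random(seed)
    states = range(len(P))
    theta = [0.0] * len(features[0])
    state = 0
    for t in range(steps):
        # Sample the next state from row `state` of the transition matrix.
        nxt = rng.choices(states, weights=P[state])[0]
        # Temporal difference: one-step cost plus the discounted estimate
        # of the next state, minus the estimate of the current state.
        delta = (cost[state] + gamma * dot(theta, features[nxt])
                 - dot(theta, features[state]))
        # Diminishing step size (an assumed schedule, not from the source).
        alpha = 10.0 / (t + 100.0)
        # Move theta along the feature vector of the current state.
        theta = [th + alpha * delta * f
                 for th, f in zip(theta, features[state])]
        state = nxt
    return theta


# Hypothetical two-state chain, one-hot (tabular) features, discount 0.5.
P = [[0.9, 0.1], [0.2, 0.8]]
theta = td0_linear(P, [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5, 100_000)
print(theta)
```

With one-hot features the linear estimate can represent any cost-to-go function, so on this chain the iterate should end up close to the exact discounted cost-to-go (I − γP)^{-1} c, roughly 1.846 and 0.308. With fewer features than states it instead approaches a projected fixed point rather than the exact function.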

Instance Optimal Learning

We consider the following basic learning task: given independent draws from an unknown distribution over a discrete support, output an approximation of the distribution that is as accurate as possible in ℓ1 distance (equivalently, total variation distance, or “statistical distance”). Perhaps surprisingly, it is often possible to “de-noise” the empirical distribution of the samples...
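To make the error measure concrete, the following Python sketch computes the plain empirical distribution and its ℓ1 distance to the true distribution. According to the abstract, this empirical distribution is the baseline that can often be improved by de-noising. Total variation distance is half the ℓ1 distance, so the two measures are equivalent up to a factor of 2. The helper names are hypothetical, and the sketch does not implement the authors' de-noising procedure.

```python
from collections import Counter


def empirical_distribution(samples, support):
    """Relative frequency of each support element among the samples."""
    counts = Counter(samples)
    n = len(samples)
    return {x: counts[x] / n for x in support}


def l1_distance(p, q):
    """Sum of absolute differences; missing keys count as probability 0."""
    keys = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def total_variation(p, q):
    """Total variation distance is half the l1 distance."""
    return 0.5 * l1_distance(p, q)


true_p = {"a": 0.5, "b": 0.5}
emp = empirical_distribution(["a", "a", "a", "b"], support=["a", "b"])
print(emp, l1_distance(emp, true_p), total_variation(emp, true_p))
```

For a fair two-outcome distribution and the samples a, a, a, b, the empirical distribution is {a: 0.75, b: 0.25}. Its ℓ1 error is 0.5 and its total variation error is 0.25.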

An Analysis of Experience Replay in Temporal Difference Learning

An Analysis of Temporal-Difference Learning with Function Approximation

We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator online during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with pr...
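On a small example, the limit characterized by such convergence proofs can be computed directly. Assume TD(0) (λ = 0), a finite state space, and a stationary distribution π. Under these assumptions, the limit θ* of linear TD solves Φᵀ D (Φ − γ P Φ) θ = Φᵀ D c with D = diag(π). In other words, Φθ* is the fixed point of the Bellman operator followed by a π-weighted projection onto the span of the features. The pure-Python sketch below builds and solves that linear system. The function names are hypothetical, power iteration stands in for any stationary-distribution solver, and the dense Gaussian elimination is only suitable for tiny examples.

```python
def stationary_distribution(P, iters=10_000):
    """Power iteration pi <- pi P (assumes an irreducible, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def td0_fixed_point(P, cost, features, gamma):
    """Solve Phi^T D (Phi - gamma P Phi) theta = Phi^T D c, D = diag(pi)."""
    n, d = len(P), len(features[0])
    pi = stationary_distribution(P)
    # Expected next-state features: (P Phi)[i][k].
    p_phi = [[sum(P[i][j] * features[j][k] for j in range(n))
              for k in range(d)] for i in range(n)]
    A = [[sum(pi[i] * features[i][k] * (features[i][m] - gamma * p_phi[i][m])
              for i in range(n)) for m in range(d)] for k in range(d)]
    b = [sum(pi[i] * features[i][k] * cost[i] for i in range(n))
         for k in range(d)]
    return solve(A, b)


P = [[0.9, 0.1], [0.2, 0.8]]
cost = [1.0, 0.0]
print(td0_fixed_point(P, cost, [[1.0, 0.0], [0.0, 1.0]], 0.5))
print(td0_fixed_point(P, cost, [[1.0], [1.0]], 0.5))
```

Consider the two-state chain above with costs [1, 0] and γ = 0.5. One-hot features recover the exact cost-to-go (about 1.846 and 0.308). A single constant feature gives π·c / (1 − γ) = 4/3.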

Bayes Optimal Instance-Based Learning

In this paper we present a probabilistic formalization of the instance-based learning approach. In our Bayesian framework, moving from the construction of an explicit hypothesis to a data-driven instance-based learning approach is equivalent to averaging over all the (possibly infinitely many) individual models. The general Bayesian instance-based learning framework described in this paper can ...
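The idea of averaging over individual models can be illustrated with ordinary Bayesian model averaging over a finite set of models. Each model's prediction is weighted by its posterior probability, which is proportional to prior times likelihood. This Python sketch uses two hypothetical coin models (heads probability 0.5 or 0.9) and hypothetical helper names. It is a generic illustration, not the instance-based framework of the paper, which also allows infinitely many models.

```python
def coin_likelihood(p_heads, flips):
    """Probability of an observed flip sequence ("H"/"T") under a coin model."""
    result = 1.0
    for flip in flips:
        result *= p_heads if flip == "H" else 1.0 - p_heads
    return result


def bayesian_model_average(models, prior, likelihood, data, predict):
    """Return (posterior weights, posterior-averaged prediction)."""
    # Unnormalized posterior: prior times likelihood of the observed data.
    weights = [p * likelihood(m, data) for m, p in zip(models, prior)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    # Prediction: posterior-weighted average of each model's prediction.
    prediction = sum(w * predict(m) for m, w in zip(models, posterior))
    return posterior, prediction


# Two hypothetical coin models, uniform prior, three observed heads.
posterior, p_heads = bayesian_model_average(
    [0.5, 0.9], [0.5, 0.5], coin_likelihood, ["H", "H", "H"], lambda p: p
)
print(posterior, p_heads)
```

After three heads under a uniform prior, the posterior favors the 0.9 model (about 0.854 versus 0.146). The averaged predictive probability of heads is about 0.84.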

Journal

Journal title: SIAM Journal on Mathematics of Data Science

Year: 2021

ISSN: 2577-0187

DOI: https://doi.org/10.1137/20m1331524